ABSTRACT


Theil and van de Panne have shown how to replace the problem of
maximizing a (strictly concave) quadratic function subject to linear
inequality constraints by a finite sequence of sub-problems involving
only linear equality constraints. In another paper, the author generalized
this approach to (i) cover the case of a differentiable and strictly
concave objective function, and (ii) permit almost complete flexibility
in the choice of the initial sub-problem. The last feature seems essential
for the approach to be of computational interest, for computational
experience suggests that the number of sub-problems that must be solved
and the amount of computer storage required to keep track of them have
a tendency to grow approximately exponentially with the "poorness" of
the choice of the initial sub-problem.

In  this  paper  a  modification  of  the  above  approach  is  proposed 
which  generates  the  sub-problems  in  Markovian  fashion.  This  all  but 
eliminates  the  storage  problem.  Although  the  resulting  sequence  of 
sub-problems  is  no  longer  necessarily  finite,  by  means  of  the  theory 
of  Markov  chains  it  is  shown  that  eventual  convergence  to  the  optimum 
is  assured  with  probability  one  and  argued  that  the  expected  number  of 
sub-problems that must be solved increases only approximately linearly with
the  "poorness"  of  the  initial  sub-problem.  Computational  evidence  is  given 
which  supports  this  estimate  and  suggests  the  probable  efficiency  of  the 
Markovian  algorithm  even  for  quite  "bad"  choices  of  the  initial  sub-problem. 


This paper is a sequel to a previous one [1] in which the author
gave a procedure for solving the problem

(1)  Maximize $f(x)$ subject to $a_i x \le b_i$, $i = 1, \ldots, m$,

where $f$ is a strictly concave and differentiable function that assumes
its unconstrained maximum.1/ The $a_i$ and $x$ are $n$-vectors and the $b_i$
are scalars. It is also assumed that (1) is feasible, which implies that
it has a unique optimal solution $x^*$, and that the $a_i$ corresponding to
the constraints that are satisfied with strict equality at $x^*$ are linearly
independent.

The procedure amounts to reducing (1) to a finite sequence of sub-problems
of the form

(2)  Maximize $f(x)$ subject to $a_i x = b_i$, $i \in S$,

where $S$ is a subset of the constraint indices. Note that (2) involves
only linear equality constraints, and is therefore considerably more
amenable to solution than (1). The sequence of sub-problems is determined
by a finite sequence $S^0, S^1, \ldots, S^K$, where $S^0$ is nearly arbitrary and $S^K$
yields the optimal solution of (1). Rules are given for determining $S^k$
given $S^0, \ldots, S^{k-1}$, and computational advantage can be taken (when (2) is
solved) of the fact that $S^k$ differs by only one constraint index from
one of its predecessors.

The procedure can be viewed as a generalization of Theil and van de
Panne's algorithm [2] for quadratic programming. Aside from applicability
to a larger class of problems, the essential generalization is that $S^0$ no
longer must be chosen to be the empty set. This permits advantage to be
taken, by choosing $S^0$ appropriately, of the frequent availability of
prior (but possibly erroneous) information regarding which of the inequality
constraints of (1) are actually restrictive. In fact, with problems that
have more than a few constraints it is almost mandatory to use such information
to guide a propitious choice of $S^0$, for computational experience
[1] suggests that the total number of sub-problems that must be solved and
the amount of computer storage required to keep track of them tend to
increase approximately exponentially with $d(S^0)$, the "distance" (to be
defined more precisely below) from $S^0$ to a "true" set of restrictive
constraints of (1).

1/ Linear equality constraints, which can be handled [1] by a simple
modification of the procedure much more efficiently than by expressing
them as inequalities, have been excluded from (1) for the sake of
notational simplicity.

The purpose of this paper is to suggest how the approximately exponential
dependence of computational work on $d(S^0)$ can be ameliorated to
approximately linear dependence by generating the sub-problems in a
Markovian rather than deterministic fashion. This strategy essentially
eliminates the storage problem, for $S^k$ will depend in a very simple
manner only on $S^{k-1}$ (it differs from it by exactly one constraint). It
is shown that eventual termination is assured with probability 1, and
argued that the expected number of sub-problems to be solved before
termination should be approximately proportional to $d(S^0)$. Computational
experience tends to confirm this estimate. Coefficients of proportionality
of about 2 were observed, which means that, for the test problems at least,
the Markovian algorithm is quite efficient even when $d(S^0)$ is large.

In what follows, the assumptions of the opening paragraph are assumed
to hold. Although an effort has been made to keep the present paper
self-contained, at least so far as definitions are concerned, reference [1]
should be consulted for motivation and for proofs of the unproved assertions
below.
THE MARKOVIAN ALGORITHM

Denote by $B$ the set $\{i \in M : a_i x^* = b_i\}$ and by $A$ the set
$\{i \in M : u_i^* > 0\}$, where $M$ is the set of the first $m$ positive integers
and the $u_i^*$ are the usual optimal "multipliers" associated with (1). From
the Kuhn-Tucker conditions, it follows that the inclusion $A \subseteq B$ always
holds. A subset $S$ of $M$ is said to be consistent when the linear
equations $a_i x = b_i$, $i \in S$, are consistent, and independent when the $a_i$,
$i \in S$, are linearly independent.

It is known that the optimal solution $x^S$ of (2) exists and is unique
whenever $S$ is consistent, and that $x^S = x^*$ if and only if $A \subseteq S \subseteq B$.

It is convenient to denote by $d(S)$ the distance from an arbitrary subset
$S$ of $M$ to the collection of subsets $\{S' \subseteq M : A \subseteq S' \subseteq B\}$, the
metric being the number of indices in the symmetric difference of two sets.
The nearest member of the collection is $(S \cap B) \cup A$, so $d(S)$ is the
number of indices in $(A - S) \cup (S - B)$. Thus $x^S = x^*$ if and only if $d(S) = 0$.
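To make the definition of $d(S)$ concrete, here is a small Python sketch (the function names `distance` and `distance_brute_force` are illustrative, and index sets are plain Python sets of integers). It evaluates $d(S)$ from the closed form and checks it against a brute-force minimum of the symmetric-difference count over the whole collection $\{S' : A \subseteq S' \subseteq B\}$. Since $A$ and $B$ depend on the unknown optimum, $d(S)$ can only be evaluated this way for test problems whose solution is already known.

```python
from itertools import chain, combinations


def distance(S, A, B):
    """d(S): number of indices in (A - S) | (S - B); assumes A is a subset of B."""
    S, A, B = set(S), set(A), set(B)
    if not A <= B:
        raise ValueError("expected A to be a subset of B (Kuhn-Tucker)")
    return len((A - S) | (S - B))


def distance_brute_force(S, A, B):
    """Same quantity by brute force: min |S ^ S'| over all S' with A <= S' <= B."""
    S, A, B = set(S), set(A), set(B)
    free = sorted(B - A)  # indices that S' may or may not contain
    choices = chain.from_iterable(combinations(free, k) for k in range(len(free) + 1))
    return min(len(S ^ (A | set(extra))) for extra in choices)


# Hypothetical example: A = {1}, B = {1, 2}, S = {2, 3}
print(distance({2, 3}, {1}, {1, 2}), distance_brute_force({2, 3}, {1}, {1, 2}))
```

Adding or dropping a single index changes $(A - S) \cup (S - B)$ by at most one element (and not at all for an index in $B - A$), so each iteration can lower $d$ by at most one. This is why $d(S^0)$ is a lower bound on the number of transitions needed.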

The  following  procedure  for  solving  (1)  is  called  "Markovian"  because 
Step  2  ensures  that  the  sequence  of  successive  values  for  S  constitutes 


a  Markov  chain. 

Step 0: Choose any initial consistent and independent $S^0$, and put
$S$ equal to $S^0$. Go to Step 1a.

Step 1a: Solve (2) for its unique optimal solution $x^S$. Put $u_i^S$
equal to 0 for $i \in M - S$ and equal to the unique solution of

$$\nabla f(x^S) = \sum_{i \in S} u_i^S a_i$$

for $i \in S$, where $\nabla$ denotes the gradient operator.
If $u_i^S \ge 0$ for all $i \in S$ and $a_i x^S \le b_i$ for $i \in M - S$,
then terminate: $(x^S, u^S) = (x^*, u^*)$. Otherwise go to Step 2a.

Step 1b: Solve the following equation for its unique solution $z^S$
and then go to Step 2b:

$$\sum_{i \in S - i_0} z_i^S a_i + a_{i_0} = 0.$$
Step 2a: Choose $i_0$ at random (with equal probability) from those
$i$ that violated the sign tests at Step 1a. If $i_0 \in S$, replace $S$
by $S - i_0$ and return to Step 1a; otherwise, replace $S$ by
$S \cup i_0$ and return to Step 1a or Step 1b according as $S \cup i_0$ is
or is not consistent and independent.

Step 2b: Choose $i_{00}$ at random (with equal probability) from those
$i$ that satisfy $z_i^S < 0$. Replace $S$ by $S - i_{00}$ and return to
Step 1a.

Finding $(x^S, u^S)$ at Step 1a is equivalent to solving the Lagrange
multiplier equations associated with (2). Various suggestions made in [1]
for efficient computational implementation carry over here.
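The steps above can be sketched in Python for the quadratic case used in the computational tests. This is a minimal, hypothetical implementation, not the author's program. It assumes $f(x) = c \cdot x - \tfrac12 x^{\mathsf T} H x$ with $H$ symmetric positive definite, so that Step 1a reduces to one linear system in $(x^S, u^S)$. It uses dense pure-Python elimination rather than the efficient updating suggested in [1], and 0-based constraint indices. Because linearly independent $a_i$ always give consistent equations, the test "consistent and independent" in Step 2a is implemented as an independence check. All function names (`step_1a`, `step_1b`, `markovian`, ...) are illustrative.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, rhs):
    """Solve the square system M y = rhs (Gauss-Jordan, partial pivoting)."""
    n = len(M)
    aug = [[float(a) for a in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= f * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def is_independent(rows, tol=1e-9):
    """True when the vectors in rows are linearly independent (rank test)."""
    mat = [[float(a) for a in r] for r in rows]
    rank = 0
    for col in range(len(mat[0]) if mat else 0):
        if rank == len(mat):
            break
        piv = max(range(rank, len(mat)), key=lambda r: abs(mat[r][col]))
        if abs(mat[piv][col]) < tol:
            continue
        mat[rank], mat[piv] = mat[piv], mat[rank]
        for r in range(rank + 1, len(mat)):
            f = mat[r][col] / mat[rank][col]
            for k in range(col, len(mat[r])):
                mat[r][k] -= f * mat[rank][k]
        rank += 1
    return rank == len(mat)


def step_1a(H, c, A, b, S):
    """Step 1a for f(x) = c.x - x'Hx/2: solve Hx + sum_{i in S} u_i a_i = c
    and a_i x = b_i (i in S); u_i = 0 for i not in S."""
    n, idx = len(c), sorted(S)
    K = [[0.0] * (n + len(idx)) for _ in range(n + len(idx))]
    for r in range(n):
        K[r][:n] = [float(v) for v in H[r]]
        for j, i in enumerate(idx):
            K[r][n + j] = K[n + j][r] = float(A[i][r])
    y = solve(K, list(c) + [b[i] for i in idx])
    u = [0.0] * len(A)
    for j, i in enumerate(idx):
        u[i] = y[n + j]
    return y[:n], u


def step_1b(A, T, i0):
    """Step 1b: z with sum_{i in T} z_i a_i + a_{i0} = 0, where T = S - i0 is
    independent and a_{i0} lies in its span (solved via normal equations)."""
    idx = sorted(T)
    G = [[dot(A[j], A[k]) for k in idx] for j in idx]
    return dict(zip(idx, solve(G, [-dot(A[j], A[i0]) for j in idx])))


def markovian(H, c, A, b, S0, rng, tol=1e-9, max_iter=10_000):
    """Return (x*, u*, number of times Step 1 was entered)."""
    S, iterations = set(S0), 0                 # Step 0: S0 assumed independent
    while iterations < max_iter:
        iterations += 1                        # Step 1a
        x, u = step_1a(H, c, A, b, S)
        violators = [i for i in sorted(S) if u[i] < -tol]
        violators += [i for i in range(len(A))
                      if i not in S and dot(A[i], x) > b[i] + tol]
        if not violators:
            return x, u, iterations
        i0 = rng.choice(violators)             # Step 2a: uniform random choice
        if i0 in S:
            S.remove(i0)
        elif is_independent([A[i] for i in sorted(S | {i0})]):
            S.add(i0)                          # independent rows are consistent
        else:
            iterations += 1                    # Step 1b (current S plays S - i0)
            z = step_1b(A, S, i0)
            i00 = rng.choice([i for i in sorted(z) if z[i] < -tol])  # Step 2b
            S = (S | {i0}) - {i00}
    raise RuntimeError("no termination within max_iter iterations")


# Hypothetical example: maximize -(x1-2)^2 - (x2-2)^2 (up to a constant)
# subject to x1 <= 1, x2 <= 1, x1 + x2 <= 1.5.
H, c = [[2, 0], [0, 2]], [4, 4]
A, b = [[1, 0], [0, 1], [1, 1]], [1, 1, 1.5]
x, u, its = markovian(H, c, A, b, {0, 1}, random.Random(0))
print(x, u, its)
```

In the example only $x_1 + x_2 \le 1.5$ is restrictive at $x^* = (0.75, 0.75)$, so starting from $S^0 = \{x_1 \le 1,\ x_2 \le 1\}$ gives $d(S^0) = 3$. Adding the third constraint makes $S$ dependent and triggers Steps 1b and 2b, and whichever $i_{00}$ Step 2b picks, $d$ falls by one at each of the three transitions.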

It follows from the results of [1] that this procedure, which differs
from the original only in that a randomized rule is used to determine $i_0$
and $i_{00}$, is well-defined, and that the following lemma holds.

Lemma:2/ At Step 2a, $d(S + i) = d(S) - 1$ for at least one $i$ violating a
test at Step 1a. At Step 2b, $d(S - i) = d(S) - 1$ for at least one $i$
satisfying $z_i^S < 0$.

Each time Step 1 is entered, a new iteration begins. The sequence
of trial sets $\langle S^0, S^1, \ldots \rangle$ generated by the Markovian algorithm is obviously
a Markov chain. The subsets of $M$ satisfying $d(S) = 0$ can be thought of
as absorbing states. In view of the random choice rule of Step 2 and the
Lemma, at least one absorbing state is accessible (in exactly $d(S^0)$
transitions, in fact) from any consistent and independent $S^0$. By a basic
property of finite Markov chains, therefore, we have the following

Theorem:3/ The Markovian algorithm terminates with probability 1.

2/ $S + i$ denotes $S \cup i$ when $i \notin S$, and $S - i$ otherwise.
RATE OF CONVERGENCE

In applications, of course, what really matters is the distribution
of the number of iterations before termination. We shall use a simple
random walk model to derive an estimate of the mean of this distribution
as a function of $d(S^0)$.

For any given problem (1), consider the (finite) collection of all
subsets of $M$ that could ever arise in the course of executing the
Markovian algorithm. If the largest value of $d(S)$ over this collection
is $D$ ($D \le m$), then the collection can be partitioned naturally into
$D + 1$ classes according to the value of $d(S)$ for each set. From the
above discussion, it follows that the transition matrix for the associated
Markov chain can be schematically represented as in Figure 1,
where the natural partition has been used, the $P$ matrices have at least
one positive entry in each row, the $Q$ matrices are unspecified, and $I$
and $0$ represent identity and null matrices. We approximate the actual
situation by the simplified random walk model of Figure 2, which has
$D + 1$ states instead of $D + 1$ classes of states. The parameter $p$
represents the aggregate probability that a set $S$ will transit, by
an iteration of the Markovian algorithm, to a set $S'$ satisfying
$d(S') = d(S) - 1$.

FIGURE 1
Transition Matrix of the Markov Chain
Associated with the Markovian Algorithm

FIGURE 2
Simplified Transition Matrix of the Markov Chain
Associated with the Markovian Algorithm

3/ More precisely, to every $\epsilon > 0$ there exists a positive integer $N_\epsilon$
such that the probability that termination has not occurred during the
first $N_\epsilon$ iterations is less than $\epsilon$.

By standard methods one can derive the mean absorption times $t_d$
for the Markov chain represented by Figure 2, given an initial state
$d$ ($d = 1, 2, \ldots, D$):

(3)
$$
t_d =
\begin{cases}
d(2D+1) - d^2 & \text{for } p = 1/2, \\[6pt]
\dfrac{d}{2p-1} - \dfrac{1}{2p-1}\left[\left(\dfrac{1-p}{p}\right)^{D-d+1} + \cdots + \left(\dfrac{1-p}{p}\right)^{D}\right] & \text{for } 0 < p < 1,\ p \neq 1/2.
\end{cases}
$$

We see that $p = 1/2$ is a key value in that, for fixed $D$ and $d$, $t_d$
increases very rapidly as $p$ falls below 1/2 and decreases rapidly to
quite small values as $p$ rises above 1/2. For $1/2 < p < 1$, (3) yields
a linear upper bound on $t_d$ that is quite good for $.6 < p < 1$:

(4)  $t_d \le \dfrac{d}{2p-1}$ for $1/2 < p \le 1$ and $d = 1, \ldots, D$.
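The step from the random walk to (3) and (4) is a standard first-step analysis. A sketch follows, assuming that in Figure 2 the walk moves from state $d$ to $d - 1$ with probability $p$, moves to $d + 1$ with probability $1 - p$, and stays at $D$ with probability $1 - p$. The boundary rule at $D$ is an assumption here; it is the one consistent with the $p = 1/2$ case of (3). Write $r = (1-p)/p$ and $\Delta_d = t_d - t_{d-1}$:

$$
\begin{aligned}
&t_0 = 0, \qquad t_d = 1 + p\,t_{d-1} + (1-p)\,t_{d+1} \ \ (1 \le d < D), \qquad t_D = 1 + p\,t_{D-1} + (1-p)\,t_D \\
&\Longrightarrow\; \Delta_D = \frac{1}{p}, \qquad p\,\Delta_d = 1 + (1-p)\,\Delta_{d+1} \ \ (1 \le d < D) \\
&\Longrightarrow\; \Delta_d = \frac{1}{p}\sum_{j=0}^{D-d} r^{j} = \frac{1 - r^{D-d+1}}{2p-1} \qquad (p \neq 1/2) \\
&\Longrightarrow\; t_d = \sum_{k=1}^{d} \Delta_k = \frac{d}{2p-1} - \frac{1}{2p-1}\left(r^{D-d+1} + \cdots + r^{D}\right).
\end{aligned}
$$

For $p = 1/2$ (so $r = 1$) the same recursion gives $\Delta_d = 2(D - d + 1)$, which sums to $d(2D+1) - d^2$. For $1/2 < p \le 1$ we have $2p - 1 > 0$ and $0 \le r < 1$, so the subtracted term is nonnegative, which is (4).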

Note that this upper bound does not involve $D$, that it has zero intercept,
and that its slope is quite small for $p$ larger than .6 or so.

This analysis suggests that, when $p$ is greater than .5 on the
average, the expected number of iterations before termination of the
Markovian algorithm is approximately $d(S^0)/(2p-1)$.
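Formulas (3) and (4) can be checked numerically. The Python sketch below uses exact rational arithmetic from the standard `fractions` module to compare (3) with a direct solution of the first-step equations of the simplified walk. It assumes the following reading of Figure 2: from state $d$ the walk moves to $d - 1$ with probability $p$ and to $d + 1$ with probability $1 - p$, except that at state $D$ it stays put with probability $1 - p$. This boundary rule is an assumption, chosen because it reproduces the $p = 1/2$ case of (3). The function names are illustrative.

```python
from fractions import Fraction


def t_closed_form(d, D, p):
    """Mean absorption time t_d from formula (3)."""
    if p == Fraction(1, 2):
        return Fraction(d * (2 * D + 1) - d * d)
    r = (1 - p) / p
    tail = sum(r ** k for k in range(D - d + 1, D + 1))  # r^(D-d+1) + ... + r^D
    return (d - tail) / (2 * p - 1)


def t_linear_system(D, p):
    """Exact solution of t_0 = 0, t_j = 1 + p t_{j-1} + (1-p) t_{j+1} (0 < j < D),
    t_D = 1 + p t_{D-1} + (1-p) t_D, by Gauss-Jordan elimination over Fractions."""
    q = 1 - p
    # Row j holds the equation for t_{j+1}; the last column is the right-hand side 1.
    M = [[Fraction(0)] * D + [Fraction(1)] for _ in range(D)]
    for j in range(D):
        M[j][j] += 1
        if j > 0:
            M[j][j - 1] -= p
        # step up for interior states; assumed self-loop at state D
        M[j][j + 1 if j < D - 1 else j] -= q
    for col in range(D):
        piv = next(r for r in range(col, D) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(D):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [Fraction(0)] + [M[j][D] / M[j][j] for j in range(D)]


D = 10
for p in (Fraction(3, 10), Fraction(1, 2), Fraction(17, 20)):
    exact = t_linear_system(D, p)
    assert all(t_closed_form(d, D, p) == exact[d] for d in range(1, D + 1))
    print(p, [round(float(t), 2) for t in exact[1:4]])
```

The gap between (3) and the bound (4) is $(r^{D-d+1} + \cdots + r^{D})/(2p-1) < r/((1-r)(2p-1))$. For $p = .85$ (so $r = 3/17$) this is below 0.31 iterations for every $d$ and $D$, which is the sense in which (4) is quite good above $p = .6$.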


COMPUTATIONAL  EXPERIENCE 

The Markovian algorithm was programmed for the IBM 7094 for the case
in which $f(x)$ is quadratic, and tests were conducted on three
medium-sized problems. Test problems 1 and 3, of practical origin, were 20 x 9
(20 variables and 9 constraints) and 50 x 25, respectively. Test
problem 2, 10 x 15, was methodically generated from a random number table.
Each problem was run at 4 arbitrarily selected initial sets for each of
a number of equally spaced values of $d(S^0)$, and the calculations were
done in such a way as to enable $p$ to be estimated. The estimates are
.85, .84, and .78, respectively. Evidently the critical value $p = 1/2$
was amply exceeded in all of the test problems. Tables 1, 2, and 3
summarize the computational results, which tend to confirm the predicted
proportional behavior of the number of iterations as a function of
$d(S^0)$.

TABLE 1

Summary of Computational Results for
Test Problem 1 (20 x 9, p = .85)

        Total Number of Iterations before Termination
d(S⁰)   Run 1   Run 2   Run 3   Run 4   Avg.    d(S⁰)/(2p-1)
  2       2       4       2       2      2.5        2.9
  4      12       6      10       6      8.5        5.7
  6       6      10       8       6      7.5        8.6
  8      12      10       8       8      9.5       11.4

TABLE 2

Summary of Computational Results for
Test Problem 2 (10 x 15, p = .84)

        Total Number of Iterations before Termination
d(S⁰)   Run 1   Run 2   Run 3   Run 4   Avg.    d(S⁰)/(2p-1)
  2       2       6       2       8      4.5        2.9
  5      11       5       7      11      8.5        7.4
  8      10      21      20      10     15.25      11.8
 11      17      15      23      11     16.5       16.2
 14      24      24      26      22     24.0       20.6

TABLE 3

Summary of Computational Results for
Test Problem 3 (50 x 25, p = .78)

        Total Number of Iterations before Termination
d(S⁰)   Run 1   Run 2   Run 3   Run 4   Avg.    d(S⁰)/(2p-1)
  3       5      11      15       3      8.5        5.4
  8      18       8      12      30     17.0       14.3
 13      27      21      27      15     22.5       23.2
 18      32      32      18      22     26.0       32.2
 23      27      23      35      33     29.5       41.1
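The derived columns of Tables 1 to 3 are simple arithmetic on the runs. The Python sketch below (run data transcribed from the tables; the helper name `summarize` is illustrative) recomputes the averages and the estimate $d(S^0)/(2p-1)$ and checks that no run used fewer than $d(S^0)$ iterations. It also fits a line through the origin to the averages by least squares, one plausible way (not necessarily the author's) to read off a coefficient of proportionality.

```python
# Runs transcribed from Tables 1-3: problem -> (estimated p, {d(S0): [Run 1..Run 4]})
TABLES = {
    1: (0.85, {2: [2, 4, 2, 2], 4: [12, 6, 10, 6], 6: [6, 10, 8, 6], 8: [12, 10, 8, 8]}),
    2: (0.84, {2: [2, 6, 2, 8], 5: [11, 5, 7, 11], 8: [10, 21, 20, 10],
               11: [17, 15, 23, 11], 14: [24, 24, 26, 22]}),
    3: (0.78, {3: [5, 11, 15, 3], 8: [18, 8, 12, 30], 13: [27, 21, 27, 15],
               18: [32, 32, 18, 22], 23: [27, 23, 35, 33]}),
}


def summarize(p, rows):
    """Per d(S0): average iterations, the estimate d/(2p-1), and whether every
    run respected the lower bound d(S0); plus a zero-intercept LS slope."""
    table = [(d, sum(runs) / len(runs), d / (2 * p - 1), min(runs) >= d)
             for d, runs in sorted(rows.items())]
    slope = sum(d * avg for d, avg, _, _ in table) / sum(d * d for d, *_ in table)
    return table, slope


for problem, (p, rows) in TABLES.items():
    table, slope = summarize(p, rows)
    print(f"Problem {problem}: fitted slope {slope:.2f}, 1/(2p-1) = {1 / (2 * p - 1):.2f}")
    for d, avg, est, ok in table:
        print(f"  d={d:2d}  avg={avg:5.2f}  d/(2p-1)={est:5.1f}  all runs >= d: {ok}")
```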

For  each  problem,  the  average  computing  time  per  iteration  was  well 
under  one  second. 

Although computational experience with three quadratic test problems
is hardly conclusive, it is remarkable that the average number of iterations
should have been observed so near to the absolute minimum, which is
$d(S^0)$. Perhaps variants of the simple random choice rule of Step 2 can
be devised to come even closer to achieving that lower bound, as for
example by weighting the probabilities in favor of constraints that are
in greatest violation of a sign test.